Code Name : Vale

Project title : Economic Inequality and its Dynamics in the United States

Authors:

Chun Hin Matthew So -

Sonia Yeh -

Atreya Bhamidi -

Johnny Carroll -

Keywords: Economy, Inequality, Inequity, Wealth Distribution, Income Distribution, Race, Education, Age

Affiliation: INFO-201: Technical Foundations of Informatics - The Information School - University of Washington

Date: Winter 2022

Abstract

The COVID-19 pandemic has led to an unprecedented increase in economic inequality in the US. This broadening disparity underlies several social issues plaguing our country, and understanding it could hold the key to the well-being of many groups of people. To analyze trends in and formulate solutions to this inequality, we will be looking at a group of datasets containing information on wealth distribution in the US.

1.0 Intro

Throughout the past decade, economic inequality has intensified to levels never seen before. While there are a variety of potential contributing factors - technological change, globalization, and unfair governmental policy shifts, among many others - financial crises such as the one brought on by the COVID-19 pandemic have accelerated the widening of the gap between the poor and the rich. This is cause for great concern; with large sections of society being left behind with very little economic opportunity and mobility, countrywide economic growth is negatively impacted, political polarization is deepened, and communal tensions are stoked.

It is thus imperative that socially conscious data scientists analyze the issue of economic inequality. One common way to do so is to collect data on the distribution of wealth, that is, on how a nation’s wealth (real estate, consumer durables, private businesses, and other financial assets) is divided among its population. In our project, we look at datasets containing information on wealth distribution by generation, race, education, and income strata. We hope to analyze these datasets to identify trends in economic inequality over time and potential actionable solutions.

2.0 Design situation

3.0 Research questions :

Economic inequality is at the root of many social issues. By looking into such inequality, we are able to examine people’s quality of life and predict determinants of the health and well-being of individuals and families. Looking at how economic well-being in the United States varies with social factors such as age/generation, race/ethnicity, income levels, and education will give us a glimpse into the stratification of American society and a basis to formulate potential solutions to this pressing problem. Some guiding questions that we hope to answer are as follows:

  1. How has the distribution of wealth in the United States changed over the last decade?

  2. How does the distribution of wealth vary with an individual’s level of education? What are the education dynamics observed from the dataset?

  3. How does the distribution of wealth vary with race? What are the racial or ethnic dynamics observed from the dataset?

  4. How has the COVID-19 pandemic affected wealth and income distribution? How do these observed dynamics vary between the start (2019/2020) and the middle (2021) of the pandemic?

As a first look at some of these questions, we have made a few charts to help visualize the distribution of wealth across the various factors using data from each of our datasets.

The clustered bar chart compares wealth across racial and ethnic groups before and during the pandemic. Because the underlying data pairs one categorical variable with one numerical variable, our group chose a clustered bar chart to illustrate this inequality. We also integrated interactive elements into the chart for better clarity. For example, when the user selects a section of the chart by left-clicking, the chart automatically zooms into that section; clicking the auto-scale button in the top right corner restores the original scale. Hovering over a bar displays further information about it. According to the chart, White Americans’ wealth increased steadily before and during the pandemic, while minorities’ wealth remained mostly unchanged. In addition, White Americans’ wealth is at least six times that of any minority group. The chart thus provides evidence of wealth inequality between White Americans and minorities during the pandemic.

The pie charts show how the distribution of wealth by educational level changed between the periods before and during the pandemic. Pie charts are simple but effective in displaying variables as percentage shares, and thus these charts provide valuable information about the concentration of wealth. The charts show an increase in percentage share for college-educated individuals and a decrease in percentage share for individuals without a high school education. The charts also show that college-educated individuals control the vast majority of wealth (over 70%).

The line chart shows the change in wealth distribution across income percentile groups before and during the pandemic. A line chart makes it easy to compare the groups’ shares of assets and gives clear insight into the gap between them. Furthermore, the graph shows that the highest income group has an income change that is 5 times greater than that of the lowest group. The chart also suggests that lower-income groups were hit harder by the pandemic: their wealth varied little, while the higher percentiles saw larger changes.

4.0 The Dataset

The primary dataset, titled Distributional Financial Accounts (DFA) - Income Levels, consists of 14 variables/attributes (columns) and 517 observations (rows). The secondary datasets, DFA - Net Worth Levels, Race Levels, and Education Levels, are identical in dimensions and complexity. Each of these datasets contains the same variables but varies in observations based on the entries in the second column, “Category.” Thus, there is a potential to study the intersection of some of these categories through merging or mutating the datasets.

Here is a table showing a quick summary of information provided by the primary data set, DFA - Income Levels.

The first two entries show the number of observations (rows) and variables (columns) in the dataset, respectively. The next three entries describe the difference in the percentage share of net worth between the Top 1% and Bottom 20% of income earners in three key years: 2021 (during the pandemic), 2018 (before the pandemic), and 1989 (the earliest year for which data is available). These show the concentration of the total share of wealth increasing slightly between the pre-pandemic and pandemic eras, and increasing massively since 1989.
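The share gap described here can be illustrated with a minimal Python sketch. It uses the 2021:Q3 net worth figures from the Income Levels table shown later in this report. The function name `share_gap` and the use of a single quarter to stand in for a whole year are assumptions made for illustration, and may not match how the summary table was computed.

```python
# Net worth by income percentile group for 2021:Q3, copied from the
# Income Levels table in this report.
NET_WORTH_2021_Q3 = {
    "pct99to100": 36806136,
    "pct80to99": 59587088,
    "pct60to80": 21068915,
    "pct40to60": 9911057,
    "pct20to40": 5585346,
    "pct00to20": 3932744,
}


def share_gap(net_worth, top="pct99to100", bottom="pct00to20"):
    """Return the gap, in percentage points, between the shares of
    total net worth held by the `top` and `bottom` groups."""
    total = sum(net_worth.values())
    top_share = 100 * net_worth[top] / total
    bottom_share = 100 * net_worth[bottom] / total
    return top_share - bottom_share


gap_2021_q3 = share_gap(NET_WORTH_2021_Q3)
print(f"Top 1% minus Bottom 20% share: {gap_2021_q3:.1f} percentage points")
```

Running the same function on the 2018 and 1989 rows would give the other two entries under the same assumptions.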

Entries 6, 7, and 8 show the difference between the mean and median net worth of all income groups within the same three key years as above. If the mean is significantly higher than the median, it shows a disproportionate concentration of wealth in the higher end of the income scale. The net worth of the 40-60% income group is used as a representative value for median net worth.
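A minimal Python sketch of this mean-versus-median comparison follows. It assumes that the "mean" is the simple average of the six income groups’ net worth in one quarter (2021:Q3 here); the report does not spell out the exact calculation, so this is only one plausible reading. The function name `mean_median_gap` is hypothetical.

```python
from statistics import mean

# Net worth by income percentile group for 2021:Q3, copied from the
# Income Levels table in this report.
NET_WORTH_2021_Q3 = {
    "pct99to100": 36806136,
    "pct80to99": 59587088,
    "pct60to80": 21068915,
    "pct40to60": 9911057,
    "pct20to40": 5585346,
    "pct00to20": 3932744,
}


def mean_median_gap(net_worth, median_group="pct40to60"):
    """Return (mean, median proxy, mean - median proxy), using the
    middle income group as the representative median value."""
    group_mean = mean(net_worth.values())
    median_proxy = net_worth[median_group]
    return group_mean, median_proxy, group_mean - median_proxy


group_mean, median_proxy, gap = mean_median_gap(NET_WORTH_2021_Q3)
print(f"mean={group_mean:.0f} median proxy={median_proxy} gap={gap:.0f}")
```

Because the groups cover different numbers of households (the top group is 1% of earners, the next is 19%), a simple average across groups is only a rough indicator of skew.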

The last entry describes the ratio of liability burden, which we define as the ratio of liabilities to assets, between the top 1% and bottom 20% income earners. This indicates that the bottom 20% faces 4 times the liability burden as the top 1%.
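The liability burden ratio defined above can be sketched in Python as follows. The figures are the 2021:Q3 assets and liabilities from the Income Levels table later in this report; choosing that quarter is an assumption, and the function name `liability_burden` is hypothetical.

```python
# (Assets, Liabilities) for 2021:Q3, copied from the Income Levels
# table in this report.
TOP_1_PERCENT = (38261873, 1455737)    # pct99to100
BOTTOM_20_PERCENT = (4646139, 713395)  # pct00to20


def liability_burden(assets, liabilities):
    """Liability burden as defined in the text: liabilities / assets."""
    return liabilities / assets


top_burden = liability_burden(*TOP_1_PERCENT)
bottom_burden = liability_burden(*BOTTOM_20_PERCENT)
burden_ratio = bottom_burden / top_burden

print(f"Top 1% burden:     {top_burden:.3f}")
print(f"Bottom 20% burden: {bottom_burden:.3f}")
print(f"Ratio (bottom / top): {burden_ratio:.1f}")
```

With these figures the ratio comes out close to 4, in line with the value reported in the summary.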

Data Provenance

  • Who or what is represented in the data?

    The data paints a picture of the distribution of household wealth in the United States since 1989. It represents the household wealth statistics of 6500 families surveyed as part of the triennial Survey of Consumer Finances (and other families like them).

  • What is an observation? What variables are included?

    Each observation contains data on net worth by category and fiscal quarter/year. Categories follow the titles of each of the datasets: the Net Worth Levels dataset contains categories “Top1,” “Next9,” “Next40,” and “Bottom50,” dividing households into groups based on percentiles of net worth. Similarly, the other datasets have categories dividing households into groups based on race, education levels, and income levels.

  • Each dataset has 14 variables:

    1. Date: Year and Fiscal Quarter (ex: 1994:Q3)
    2. Category: Varies by dataset (ex: Bottom50, Asian)
    3. Net Worth: Assets - Liabilities, Dollars
    4. Assets: Dollars, with 6 subsections as additional variables
    5. Real Estate: Dollars
    6. Consumer Durables: Dollars
    7. Corporate Equities and mutual fund shares: Dollars
    8. Pension Entitlements: Dollars
    9. Private businesses: Dollars
    10. Other Assets: Dollars
    11. Liabilities: Dollars, with 3 subsections as additional variables
    12. Home Mortgages: Dollars
    13. Consumer Credit: Dollars
    14. Other liabilities: Dollars
  • Who collected the data? How was the data collection effort funded?

    The data was collected by the National Opinion Research Center (NORC) at the University of Chicago, and the collection is sponsored by the Federal Reserve Board in cooperation with the Department of the Treasury. Data from the SCF is used by a multitude of organizations, from analysis by the Federal Reserve and news organizations to research by universities and economic research centers. More information on the source of the data and the SCF can be found at this link.

  • How was the data validated and held secure? Is it credible and trustworthy?

    The SCF website describes in detail their policies regarding data confidentiality and security. Per their website, they utilize a multi-tiered approach to manage issues associated with computer and data security. Personally Identifiable Information (PII) is used only to contact survey respondents, and the data is fully anonymized before it is provided to the SCF sponsor, the Federal Reserve Board. In its surveys, the NORC complies with several federal regulations on information security and management, most notably the Federal Information Security Management Act (FISMA).

    The NORC and Federal Reserve make concerted efforts to ensure that the study is representative of households from all economic strata. Households are randomly selected as per guidelines laid out in working papers on the Federal Reserve website, with the intention of representing the full range of households in the United States. As per the Federal Reserve, “to maintain the scientific validity of the study, interviewers are not allowed to substitute respondents for families that do not participate. Thus, if a family declines to participate, it means that families like theirs may not be represented clearly in national discussions.” Thus, the representativeness of the data is impacted by households that do not participate, but on the whole, there seem to be extensive measures taken to safeguard the credibility, security, and validity of the data.

  • How did you obtain the data?

    Through a search for federal data on economic inequity, we came upon the Federal Reserve’s interactive data visualization titled “Distribution of Household Wealth in the U.S. since 1989” found at this link.

    The visualization offered publicly downloadable datasets on each of the categories mentioned above.

For a brief glimpse at the Income dataset, here is a table with several entries from the three key years we’ve identified above, broken down by quarter.

##        Date   Category Net.worth   Assets Liabilities
##  1: 1989:Q3 pct99to100   3506890  3649271      142381
##  2: 1989:Q3  pct80to99   8885419 10366937     1481518
##  3: 1989:Q3  pct60to80   3404719  4252640      847921
##  4: 1989:Q3  pct40to60   2505940  2931696      425755
##  5: 1989:Q3  pct20to40   1517273  1706702      189429
##  6: 1989:Q3  pct00to20    586690   650120       63430
##  7: 1989:Q4 pct99to100   3594228  3742691      148463
##  8: 1989:Q4  pct80to99   9089680 10591548     1501868
##  9: 1989:Q4  pct60to80   3469967  4338513      868546
## 10: 1989:Q4  pct40to60   2521558  2962965      441406
## 11: 1989:Q4  pct20to40   1523061  1722289      199228
## 12: 1989:Q4  pct00to20    593243   658812       65569
## 13: 2018:Q1 pct99to100  24359089 25535251     1176162
## 14: 2018:Q1  pct80to99  45462890 51742344     6279454
## 15: 2018:Q1  pct60to80  14444065 17847182     3403116
## 16: 2018:Q1  pct40to60   7495506  9495067     1999561
## 17: 2018:Q1  pct20to40   4139637  5302198     1162561
## 18: 2018:Q1  pct00to20   2540623  3187612      646989
## 19: 2018:Q2 pct99to100  24701330 25892076     1190746
## 20: 2018:Q2  pct80to99  45960520 52273088     6312567
## 21: 2018:Q2  pct60to80  14647876 18084169     3436293
## 22: 2018:Q2  pct40to60   7585712  9614280     2028568
## 23: 2018:Q2  pct20to40   4185822  5371946     1186123
## 24: 2018:Q2  pct00to20   2691857  3337163      645307
## 25: 2018:Q3 pct99to100  25324357 26527939     1203582
## 26: 2018:Q3  pct80to99  46513144 52877710     6364566
## 27: 2018:Q3  pct60to80  15065813 18549660     3483847
## 28: 2018:Q3  pct40to60   7690985  9755316     2064331
## 29: 2018:Q3  pct20to40   4271520  5482473     1210954
## 30: 2018:Q3  pct00to20   2818463  3463952      645489
## 31: 2018:Q4 pct99to100  23597451 24804284     1206833
## 32: 2018:Q4  pct80to99  45244893 51616549     6371656
## 33: 2018:Q4  pct60to80  14415887 17945704     3529817
## 34: 2018:Q4  pct40to60   7678196  9780885     2102689
## 35: 2018:Q4  pct20to40   4243286  5483937     1240651
## 36: 2018:Q4  pct00to20   2910422  3555114      644692
## 37: 2021:Q1 pct99to100  34161595 35586766     1425171
## 38: 2021:Q1  pct80to99  56771415 63780593     7009178
## 39: 2021:Q1  pct60to80  19952567 23653209     3700642
## 40: 2021:Q1  pct40to60   9357458 11595521     2238063
## 41: 2021:Q1  pct20to40   5163925  6511076     1347151
## 42: 2021:Q1  pct00to20   3406337  4098531      692194
## 43: 2021:Q2 pct99to100  36356825 37804671     1447845
## 44: 2021:Q2  pct80to99  58788449 65934906     7146457
## 45: 2021:Q2  pct60to80  20855077 24647834     3792756
## 46: 2021:Q2  pct40to60   9700928 11989989     2289061
## 47: 2021:Q2  pct20to40   5399293  6776392     1377099
## 48: 2021:Q2  pct00to20   3633057  4332045      698988
## 49: 2021:Q3 pct99to100  36806136 38261873     1455737
## 50: 2021:Q3  pct80to99  59587088 66845685     7258597
## 51: 2021:Q3  pct60to80  21068915 24953583     3884668
## 52: 2021:Q3  pct40to60   9911057 12252028     2340971
## 53: 2021:Q3  pct20to40   5585346  6988641     1403295
## 54: 2021:Q3  pct00to20   3932744  4646139      713395
##        Date   Category Net.worth   Assets Liabilities
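As a quick sanity check on the table above, the following minimal Python sketch parses the Date field and verifies that Net.worth equals Assets minus Liabilities for the 1989:Q3 rows. The helper names `parse_quarter` and `check_net_worth` are hypothetical, and the tolerance of 1 is an assumption to absorb what appears to be rounding in the published figures.

```python
# Rows for 1989:Q3 copied from the table above:
# (Date, Category, Net.worth, Assets, Liabilities)
ROWS = [
    ("1989:Q3", "pct99to100", 3506890, 3649271, 142381),
    ("1989:Q3", "pct80to99", 8885419, 10366937, 1481518),
    ("1989:Q3", "pct60to80", 3404719, 4252640, 847921),
    ("1989:Q3", "pct40to60", 2505940, 2931696, 425755),
    ("1989:Q3", "pct20to40", 1517273, 1706702, 189429),
    ("1989:Q3", "pct00to20", 586690, 650120, 63430),
]


def parse_quarter(date):
    """Split a 'YYYY:Qn' string into (year, quarter) integers."""
    year, quarter = date.split(":Q")
    return int(year), int(quarter)


def check_net_worth(rows, tolerance=1):
    """Return the rows where Net.worth differs from
    Assets - Liabilities by more than `tolerance`."""
    return [
        row for row in rows
        if abs(row[3] - row[4] - row[2]) > tolerance
    ]


mismatches = check_net_worth(ROWS)
print(parse_quarter(ROWS[0][0]), "mismatching rows:", len(mismatches))
```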

Here are some aggregate datapoints, taking the mean for three variables - net worth, assets, and liabilities - for each income percentile group.

##     Category Net.worth   Assets Liabilities
## 1  pct00to20   1739245  2147172    407926.5
## 2  pct20to40   3070031  3815979    745947.6
## 3  pct40to60   5066966  6468809   1401842.7
## 4  pct60to80   9071412 11466366   2394953.6
## 5  pct80to99  25668496 29972896   4304400.6
## 6 pct99to100  12786803 13468352    681548.7
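The per-group averaging behind this summary can be sketched in Python as below. The sketch groups rows by Category and takes the mean of Net.worth; it uses only the three 2021 quarters of two groups from the earlier table, so its results will differ from the figures above, which presumably average over every quarter in the dataset. The function name `mean_by_category` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (Date, Category, Net.worth) for the 2021 quarters of two income
# groups, copied from the Income Levels table in this report.
ROWS = [
    ("2021:Q1", "pct99to100", 34161595),
    ("2021:Q2", "pct99to100", 36356825),
    ("2021:Q3", "pct99to100", 36806136),
    ("2021:Q1", "pct00to20", 3406337),
    ("2021:Q2", "pct00to20", 3633057),
    ("2021:Q3", "pct00to20", 3932744),
]


def mean_by_category(rows):
    """Group rows by Category and return the mean Net.worth of each."""
    grouped = defaultdict(list)
    for _date, category, net_worth in rows:
        grouped[category].append(net_worth)
    return {category: mean(values) for category, values in grouped.items()}


means = mean_by_category(ROWS)
for category, value in sorted(means.items()):
    print(f"{category:>10}  {value:,.0f}")
```

The same grouping extends to Assets and Liabilities by adding those columns to each row.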

5.0 Expected implications:

After examining the datasets and reflecting on our research questions, we expect to find that wealth correlates with both race and income level, and to see a generally increasing trend of wealth inequality across all categories over the last few years. Specifically, we expect to see that racial minorities have experienced a more unequal distribution of wealth during the COVID-19 pandemic. We hope to explore some of the intersectional and cross-cutting trends of inequality through our analysis, which will inform our efforts to formulate potential solutions. Based on our current assessment of the sociopolitical landscape in the US, we have identified some possible avenues for economic inequality reduction that might warrant consideration.

Primarily, the rapid, pervasive digitization of the world economy induced by the pandemic has necessitated high-speed internet and computers for nearly all learning and working environments. However, access to such technology is dictated to a great extent by economic well-being, and is very limited for people in lower wealth strata. Therefore, technologists have a prime opportunity to improve technological equity by widening high-speed internet reach and through programs such as renting out affordable computers near low-income areas. Educators, designers, and technologists can collaboratively utilize this broadened network to design and provide educational content that enables individuals from lower socioeconomic backgrounds to learn new skills, thus improving social mobility. Policymakers can use wealth distribution data to inform their processes of developing social programs such as subsidies toward affordable technology, rent and eviction moratoriums, public transportation systems, and unemployment assistance. Importantly, analyses of wealth distribution would give policymakers an important reference point when redesigning taxation systems in the US.

6.0 Limitations:

Although the graphs provided by the Federal Reserve at the above-mentioned website are a practical tool for analyzing data and drawing conclusions, there are some limitations to consider when examining the data. Firstly, the website mentions that the sample size for the data is 6500 families, but the population of the United States is 329.5 million. While these families have been chosen to form a cross-section of American society as a whole, our team cannot know to what extent the sample truly represents the population. Representativeness is crucial when analyzing the data because a representative sample reduces bias. In addition, the graphs do not explicitly address income inequality during the pandemic or the factors that influence it. Our team has to infer these from the charts by comparing different years, and has to read several articles to identify the various contributing factors to economic inequality. There is always the possibility that our team might unknowingly bring our own biases and prior knowledge to the interpretation of the data, and thus overlook other possible factors and confounding variables that have affected the data. Nonetheless, the datasets are an effective tool for understanding the income and wealth distribution in the US.

References: